LoGAN: Generating Logos with a Generative Adversarial Neural Network Conditioned on color
Designing a logo is a long, complicated, and expensive process for any
designer. However, recent advancements in generative algorithms provide models
that could offer a possible solution. Logos are multi-modal, have very few
categorical properties, and do not have a continuous latent space. Yet,
conditional generative adversarial networks can be used to generate logos that
could help designers in their creative process. We propose LoGAN: an improved
auxiliary classifier Wasserstein generative adversarial neural network (with
gradient penalty) that is able to generate logos conditioned on twelve
different colors. In 768 generated instances (12 classes and 64 logos per
class), when looking at the most prominent color, the conditional generation
part of the model has an overall precision and recall of 0.8 and 0.7
respectively. LoGAN's results offer a first glance at how artificial
intelligence can be used to assist designers in their creative process and open
promising future directions, such as including more descriptive labels which
will provide a more exhaustive and easy-to-use system.
Comment: 6 pages, ICMLA1
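The color conditioning described above can be sketched minimally: a class label is appended to the latent noise before it enters the conditional generator. The 12 color classes come from the abstract; the 100-dimensional noise vector and the one-hot encoding are assumptions for illustration, not details from the paper.

```python
import numpy as np

N_CLASSES = 12   # twelve color labels (from the abstract)
NOISE_DIM = 100  # latent noise dimensionality (assumed)

def generator_input(color_id, rng=np.random.default_rng(0)):
    """Concatenate Gaussian noise with a one-hot color label."""
    z = rng.standard_normal(NOISE_DIM)
    label = np.zeros(N_CLASSES)
    label[color_id] = 1.0
    return np.concatenate([z, label])  # fed to the generator network

x = generator_input(color_id=3)
print(x.shape)  # (112,)
```

The discriminator side of an auxiliary classifier GAN would additionally predict the color label back from the generated image, which is what makes the conditioning learnable.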
A retrieval-based dialogue system utilizing utterance and context embeddings
Finding semantically rich and computer-understandable representations for
textual dialogues, utterances and words is crucial for dialogue systems (or
conversational agents), as their performance mostly depends on understanding
the context of conversations. Recent research aims at finding distributed
vector representations (embeddings) for words, such that semantically similar
words are relatively close within the vector-space. Encoding the "meaning" of
text into vectors is a current trend, and text can range from words, phrases
and documents to actual human-to-human conversations. In recent research
approaches, responses have been generated utilizing a decoder architecture,
given the vector representation of the current conversation. In this paper, the
utilization of embeddings for answer retrieval is explored by using
Locality-Sensitive Hashing Forest (LSH Forest), an Approximate Nearest Neighbor
(ANN) model, to find similar conversations in a corpus and rank possible
candidates. Experimental results on the well-known Ubuntu Corpus (in English)
and a customer service chat dataset (in Dutch) show that, in combination with a
candidate selection method, retrieval-based approaches outperform generative
ones and reveal promising future research directions towards the usability of
such a system.
Comment: A shorter version is accepted at the ICMLA2017 conference; acknowledgement added; typos corrected
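The retrieval step above hinges on approximate nearest neighbor search over embeddings. A minimal sketch of random-hyperplane LSH, the hashing family underlying LSH Forest; the dimensions and data here are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, N_BITS = 64, 16
planes = rng.standard_normal((N_BITS, DIM))  # one random hyperplane per hash bit

def lsh_key(vec):
    """Nearby vectors (small angle) tend to fall on the same side of
    each hyperplane, so they share a bucket key."""
    return tuple((planes @ vec > 0).astype(int))

# index a toy corpus of conversation embeddings into hash buckets
corpus = rng.standard_normal((100, DIM))
buckets = {}
for i, vec in enumerate(corpus):
    buckets.setdefault(lsh_key(vec), []).append(i)

# candidates for a query share its bucket; rank them afterwards as needed
candidates = buckets.get(lsh_key(corpus[7]), [])
print(7 in candidates)  # True
```

An LSH Forest refines this idea with variable-length keys and multiple trees, trading memory for recall; the bucketing principle is the same.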
Massive Open Online Courses Temporal Profiling for Dropout Prediction
Massive Open Online Courses (MOOCs) are attracting the attention of people
all over the world. Regardless of the platform, the numbers of registrants for
online courses are impressive but, at the same time, completion rates are
disappointing. Understanding the mechanisms of dropping out based on the
learner profile arises as a crucial task in MOOCs, since it will allow
intervening at the right moment in order to assist the learner in completing
the course. In this paper, the dropout behaviour of learners in a MOOC is
thoroughly studied by first extracting features that describe the behavior of
learners within the course and then by comparing three classifiers (Logistic
Regression, Random Forest and AdaBoost) in two tasks: predicting which users
will have dropped out by a certain week and predicting which users will drop
out on a specific week. The former proved to be considerably easier, with
all three classifiers performing equally well. However, the accuracy for the
second task is lower, and Logistic Regression tends to perform slightly better
than the other two algorithms. We found that features that reflect an active
attitude of the user towards the MOOC, such as submitting their assignment,
posting on the Forum and filling their Profile, are strong indicators of
persistence.
Comment: 8 pages, ICTAI1
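The three-classifier comparison described above can be sketched with scikit-learn on synthetic stand-in features; in the real setup, the columns would be behavioral counts such as assignment submissions, forum posts, and profile completion. The data and settings below are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# synthetic stand-in for per-learner behavioral features and dropout labels
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

for clf in (LogisticRegression(max_iter=1000),
            RandomForestClassifier(random_state=0),
            AdaBoostClassifier(random_state=0)):
    acc = cross_val_score(clf, X, y, cv=5).mean()  # 5-fold CV accuracy
    print(f"{type(clf).__name__}: {acc:.3f}")
```

Cross-validated accuracy is a reasonable first metric for the "dropped out by week t" task; the harder "drops out on week t" task would warrant per-week models or a multi-class target.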
Accumulated Gradient Normalization
This work addresses the instability in asynchronous data parallel
optimization. It does so by introducing a novel distributed optimizer which is
able to efficiently optimize a centralized model under communication
constraints. The optimizer achieves this by pushing a normalized sequence of
first-order gradients to a parameter server. This implies that the magnitude of
a worker delta is smaller compared to an accumulated gradient, and provides a
better direction towards a minimum compared to first-order gradients, which in
turn also forces possible implicit momentum fluctuations to be more aligned
since we make the assumption that all workers contribute towards a single
minimum. As a result, our approach mitigates the parameter staleness problem
more effectively since staleness in asynchrony induces (implicit) momentum, and
achieves a better convergence rate compared to other optimizers such as
asynchronous EASGD and DynSGD, which we show empirically.
Comment: 16 pages, 12 figures, ACML201
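The accumulate-then-normalize idea can be sketched on a one-dimensional quadratic loss: each worker runs several local first-order steps, sums the gradients it saw, and pushes the sum divided by the number of steps to the parameter server. The local step count, learning rates, and the sequential stand-in for asynchronous workers are all illustrative assumptions.

```python
def grad(w):
    """Gradient of the toy loss f(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)

def worker_delta(w_server, n_local=8, lr=0.1):
    """Run n_local SGD steps from the server copy, accumulate the
    first-order gradients, and return the accumulated gradient
    normalized by the number of local steps; the resulting delta is
    smaller in magnitude and better directed than a single gradient."""
    w, acc = w_server, 0.0
    for _ in range(n_local):
        g = grad(w)
        acc += g
        w -= lr * g
    return acc / n_local

w = 0.0  # central (parameter-server) model
for _ in range(50):       # sequential stand-in for asynchronous workers
    w -= 0.1 * worker_delta(w)
print(round(w, 2))  # close to the optimum at 3
```

In the asynchronous setting, the smaller normalized deltas are what keep stale updates from overshooting, since each commit moves the central model less than a raw accumulated gradient would.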
Adapting End-to-End Speech Recognition for Readable Subtitles
Automatic speech recognition (ASR) systems are primarily evaluated on
transcription accuracy. However, in some use cases such as subtitling, verbatim
transcription would reduce output readability given limited screen size and
reading time. Therefore, this work focuses on ASR with output compression, a
task challenging for supervised approaches due to the scarcity of training
data. We first investigate a cascaded system, where an unsupervised compression
model is used to post-edit the transcribed speech. We then compare several
methods of end-to-end speech recognition under output length constraints. The
experiments show that, with far less data than needed to train a model from
scratch, we can adapt a Transformer-based ASR model to incorporate
both transcription and compression capabilities. Furthermore, the best
performance in terms of WER and ROUGE scores is achieved by explicitly modeling
the length constraints within the end-to-end ASR system.
Comment: IWSLT 202
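The cascaded setup's post-editing step can be caricatured as an unsupervised compressor that drops low-content tokens until the transcript fits a subtitle budget. The stopword heuristic below stands in for the paper's actual compression model and is purely an assumption.

```python
# filler tokens assumed safe to drop from a verbatim transcript
STOPWORDS = {"uh", "um", "you", "know", "i", "mean", "like", "so", "well"}

def compress(transcript, max_chars):
    """Drop filler words, then truncate from the end until the
    transcript fits the character budget for one subtitle line."""
    words = [w for w in transcript.split() if w.lower() not in STOPWORDS]
    while words and len(" ".join(words)) > max_chars:
        words.pop()  # crude fallback: truncate trailing words
    return " ".join(words)

out = compress("uh so i mean the model you know adapts to partial input", 30)
print(out)  # the model adapts to partial
```

A learned end-to-end model, as the abstract argues, can do better than such a cascade because it decides jointly what to transcribe and what to omit under the length constraint.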
Low-Latency Sequence-to-Sequence Speech Recognition and Translation by Partial Hypothesis Selection
Encoder-decoder models provide a generic architecture for
sequence-to-sequence tasks such as speech recognition and translation. While
offline systems are often evaluated on quality metrics like word error rates
(WER) and BLEU, latency is also a crucial factor in many practical use-cases.
We propose three latency reduction techniques for chunk-based incremental
inference and evaluate their efficiency in terms of accuracy-latency trade-off.
On the 300-hour How2 dataset, we reduce latency by 83% to 0.8 seconds by
sacrificing 1% WER (6% rel.) compared to offline transcription. Although our
experiments use the Transformer, the hypothesis selection strategies are
applicable to other encoder-decoder models. To avoid expensive re-computation,
we use a unidirectionally-attending encoder. After an adaptation procedure to
partial sequences, the unidirectional model performs on-par with the original
model. We further show that our approach is also applicable to low-latency
speech translation. On How2 English-Portuguese speech translation, we reduce
latency to 0.7 seconds (-84% rel.) while incurring a loss of 2.4 BLEU points
(5% rel.) compared to the offline system.
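One way to realize chunk-based partial hypothesis selection is to commit only the prefix on which all current beam hypotheses agree, so already-displayed output never needs to be retracted. A sketch of that idea; the beam contents are illustrative, and the paper's actual selection strategies may differ.

```python
def stable_prefix(hypotheses):
    """Longest common token prefix of all partial beam hypotheses."""
    prefix = []
    for tokens in zip(*[h.split() for h in hypotheses]):
        if len(set(tokens)) > 1:  # hypotheses disagree from here on
            break
        prefix.append(tokens[0])
    return " ".join(prefix)

beam = ["we reduce latency by eighty",
        "we reduce latency by a",
        "we reduce latency to"]
print(stable_prefix(beam))  # we reduce latency
```

Committing agreed-upon prefixes per chunk trades a little accuracy for latency: the earlier the beam converges, the sooner tokens can be shown, which is the accuracy-latency trade-off the abstract evaluates.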
Towards Controlled Transformation of Sentiment in Sentences
An obstacle to the development of many natural language processing products
is the vast amount of training examples necessary to get satisfactory results.
The generation of these examples is often a tedious and time-consuming task.
This paper proposes a method to transform the sentiment of sentences
in order to limit the work necessary to generate more training data. This means
that one sentence can be transformed to an opposite sentiment sentence and
should reduce by half the work required in the generation of text. The proposed
pipeline consists of a sentiment classifier with an attention mechanism to
highlight the short phrases that determine the sentiment of a sentence. Then,
these phrases are changed to phrases of the opposite sentiment using a baseline
model and an autoencoder approach. Experiments are run both on the separate
parts of the pipeline and on the end-to-end model. The sentiment classifier
is tested on its accuracy and is found to perform adequately. The autoencoder
is tested on how well it is able to change the sentiment of an encoded phrase,
and this task is found to be feasible. We use human evaluation to judge the
performance of the full (end-to-end) pipeline, which reveals that a model
using word vectors outperforms the encoder model.
Numerical evaluation shows that a success rate of 54.7% is achieved on the
sentiment change.
Comment: Accepted at ICAART 2019, 8 pages
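The baseline replacement step can be caricatured with a small antonym lexicon: once the attention mechanism has located the sentiment-bearing words, each is swapped for an opposite-sentiment counterpart. The lexicon here is a toy assumption; the paper's baseline and autoencoder are learned models.

```python
# toy opposite-sentiment lexicon (assumed, not from the paper)
ANTONYMS = {"great": "terrible", "terrible": "great",
            "love": "hate", "hate": "love",
            "best": "worst", "worst": "best"}

def flip_sentiment(sentence):
    """Replace each lexicon word with its opposite-sentiment counterpart;
    all other words pass through unchanged."""
    return " ".join(ANTONYMS.get(w, w) for w in sentence.split())

print(flip_sentiment("i love this great movie"))  # i hate this terrible movie
```

Because the mapping is symmetric, flipping twice recovers the original sentence, which mirrors the abstract's point that one labeled sentence yields a training example of each polarity.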
Exploring the context of recurrent neural network based conversational agents
Conversational agents are on the rise in both the academic world (in terms
of research) and the commercial world (in terms of applications). This paper
investigates the task of building a non-goal driven conversational agent, using
neural network generative models and analyzes how the conversation context is
handled. It compares a simpler Encoder-Decoder with a Hierarchical Recurrent
Encoder-Decoder architecture, which includes an additional module to model the
context of the conversation using information from previous utterances. We
found that the hierarchical model was able to extract relevant context
information and include it in the generation of the output. However, it
performed worse (by 35-40%) than the simple Encoder-Decoder model in terms of
both grammatically correct output and meaningful responses. Despite these
results, experiments
demonstrate how conversations about similar topics appear close to each other
in the context space due to the increased frequency of specific topic-related
words, thus leaving promising directions for future research on how the
context of a conversation can be exploited.
Comment: Accepted at ICAART 2019, 10 pages
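The context-space observation above can be illustrated with toy vectors: if same-topic utterances share topic-related words, conversation representations built by averaging word vectors land close together. The word vectors below (a shared topic direction plus noise) are an assumption for illustration, not the paper's learned HRED context states.

```python
import numpy as np

rng = np.random.default_rng(0)
linux_topic = rng.standard_normal(16)
food_topic = rng.standard_normal(16)

# each word vector = its topic direction plus small word-specific noise
vocab = {w: linux_topic + 0.3 * rng.standard_normal(16)
         for w in ["install", "driver", "kernel"]}
vocab.update({w: food_topic + 0.3 * rng.standard_normal(16)
              for w in ["pasta", "oven", "recipe"]})

def context_vec(conversation):
    """Mean word vector over all utterances in the conversation."""
    words = [vocab[w] for utt in conversation for w in utt.split()]
    return np.mean(words, axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

linux_a = context_vec(["install driver", "kernel driver"])
linux_b = context_vec(["install kernel", "driver kernel"])
cooking = context_vec(["pasta recipe", "oven recipe"])
print(cosine(linux_a, linux_b) > cosine(linux_a, cooking))
```

A hierarchical encoder replaces the crude averaging with a learned utterance-level recurrence, but the clustering effect driven by topic-word frequency is the same one the abstract reports.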